1 Introduction

A schema mapping is a specification that describes how 
data from a source schema is to be mapped to a target 
schema. Schema mappings have proved to be essential 
for data-interoperability tasks such as data exchange and 
data integration. Research in this area has mainly focused
on performing these tasks. However, as Bernstein pointed
out [7], many information-system problems involve not only
the design and integration of complex application artifacts,
but also their subsequent manipulation. Driven by this
consideration, Bernstein proposed in [7] a general framework
for managing schema mappings. In this framework, mappings
are usually specified in a logical language, and high-level
algebraic operators are used to manipulate them
[7, 17, 34, 13, 8].

Two of the most fundamental operators in this framework
are the composition and inversion of schema mappings.
Intuitively, the composition can be described as follows.
Given a mapping $\mathcal{M}_1$ from a schema A to a schema B,
and a mapping $\mathcal{M}_2$ from B to a schema E, the
composition of $\mathcal{M}_1$ and $\mathcal{M}_2$ is a new
mapping that describes the relationship between schemas A and
E. This new mapping must be semantically consistent with the
relationships previously established by $\mathcal{M}_1$ and
$\mathcal{M}_2$. On the other hand, an inverse of
$\mathcal{M}_1$ is a new mapping that describes the reverse
relationship from B to A, and is semantically consistent with
$\mathcal{M}_1$.

In practical scenarios, the composition and inversion of
schema mappings have several applications. In a data exchange
context [14], if a mapping $\mathcal{M}$ is used to exchange
data from a source to a target schema, an inverse of
$\mathcal{M}$ can be used to exchange the data back to the
source, thus reversing the application of $\mathcal{M}$. As a
second application, consider a peer data management system
(PDMS) [10, 25]. In a PDMS, a peer can act as a data source, a
mediator, or both, and the system relates peers by
establishing directional mappings between the peers' schemas.
Given a query formulated on a particular peer, the PDMS must
retrieve the answers by reformulating the query using its
complex net of semantic mappings. Performing this
reformulation at query time may be quite expensive. The
composition operator can be used to combine sequences of
mappings into a single mapping that can be precomputed and
optimized for query answering purposes. Another application is
schema evolution, where the inverse together with the
composition play a crucial role [8]. Consider a mapping
$\mathcal{M}$ between schemas A and B, and assume that schema
A evolves into a schema A'. This evolution can be expressed as
a mapping $\mathcal{M}'$ between A and A'. Thus, the
relationship between the new schema A' and schema B can be
obtained by inverting mapping $\mathcal{M}'$ and then
composing the result with mapping $\mathcal{M}$.

In recent years, a lot of attention has been paid to the
development of solid foundations for the composition
[33, 17, 37] and inversion [13, 20, 4, 3] of schema mappings.
In this paper, we review the proposals for the semantics of
these crucial operators. For each of these proposals, we
concentrate on the following three problems: the definition of
the semantics of the operator, the language needed to express
the operator, and the algorithmic issues associated with the
problem of computing the operator. It should be pointed out
that we primarily consider the formalization of schema
mappings introduced in the work on data exchange [14]. In
particular, when studying the problem of computing the
composition and inverse of a schema mapping, we will be mostly
interested in computing these operators for mappings specified
by source-to-target tuple-generating dependencies [14]. Although
there has been an important amount of work about different
flavors of composition and inversion motivated by practical
applications [9, 33, 39], we focus on the most theoretically
oriented results [33, 17, 13, 20, 4, 3].

Organization of the paper. We begin in Section 2 with the
terminology that will be used in the paper. We then continue
in Section 3 by reviewing the main results for the composition
operator proposed in [17]. Section 4 contains a detailed study
of the inverse operators proposed in [13, 20, 4]. In Section
5, we review a relaxed approach to defining the semantics of
the inverse and composition operators that parameterizes these
notions by a query language [33, 3]. Finally, some future work
is pointed out in Section 6, and the proofs of the new results
presented in this survey are given in Appendix A.

2 Basic notation 

In this paper, we assume that data is represented in the
relational model. A relational schema $\mathbf{R}$, or just
schema, is a finite set $\{R_1, \ldots, R_n\}$ of relation
symbols, with each $R_i$ having a fixed arity $n_i$. An
instance $I$ of $\mathbf{R}$ assigns to each relation symbol
$R_i$ of $\mathbf{R}$ a finite $n_i$-ary relation $R_i^I$. The
domain of an instance $I$, denoted by $\mathrm{dom}(I)$, is
the set of all elements that occur in any of the relations
$R_i^I$. In addition, $\mathrm{Inst}(\mathbf{R})$ is defined
to be the set of all instances of $\mathbf{R}$.

As usual in the data exchange literature, we consider 
database instances with two types of values: constants and 
nulls. More precisely, let C and N be infinite and disjoint 
sets of constants and nulls, respectively. If we refer to 
a schema S as a source schema, then Inst(S) is defined 
to be the set of all instances of S that are constructed by 
using only elements from C, and if we refer to a schema 
T as a target schema, then instances of T are constructed 
by using elements from both C and N. 

Schema mappings and solutions. Schema mappings are used to
define a semantic relationship between two schemas. In this
paper, we use a general representation of mappings; given two
schemas $\mathbf{R}_1$ and $\mathbf{R}_2$, a mapping
$\mathcal{M}$ from $\mathbf{R}_1$ to $\mathbf{R}_2$ is a set
of pairs $(I, J)$, where $I$ is an instance of $\mathbf{R}_1$,
and $J$ is an instance of $\mathbf{R}_2$. Further, we say that
$J$ is a solution for $I$ under $\mathcal{M}$ if
$(I, J) \in \mathcal{M}$. The set of solutions for $I$ under
$\mathcal{M}$ is denoted by $\mathrm{Sol}_{\mathcal{M}}(I)$.
The domain of $\mathcal{M}$, denoted by
$\mathrm{dom}(\mathcal{M})$, is defined as the set of
instances $I$ such that
$\mathrm{Sol}_{\mathcal{M}}(I) \neq \emptyset$.

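Dependencies. As usual, we use a class of dependencies to
specify schema mappings [14]. Let $\mathcal{L}_1$,
$\mathcal{L}_2$ be query languages and $\mathbf{R}_1$,
$\mathbf{R}_2$ be schemas with no relation symbols in common.
A sentence $\Phi$ over $\mathbf{R}_1 \cup \mathbf{R}_2$ is an
$\mathcal{L}_1$-TO-$\mathcal{L}_2$ dependency from
$\mathbf{R}_1$ to $\mathbf{R}_2$ if $\Phi$ is of the form
$\forall \bar{x}\, (\varphi(\bar{x}) \rightarrow \psi(\bar{x}))$,
where (1) $\bar{x}$ is the tuple of free variables in both
$\varphi(\bar{x})$ and $\psi(\bar{x})$; (2) $\varphi(\bar{x})$
is an $\mathcal{L}_1$-formula over $\mathbf{R}_1$; and (3)
$\psi(\bar{x})$ is an $\mathcal{L}_2$-formula over
$\mathbf{R}_2$. Furthermore, we usually omit the outermost
universal quantifiers from $\mathcal{L}_1$-TO-$\mathcal{L}_2$
dependencies and, thus, we write
$\varphi(\bar{x}) \rightarrow \psi(\bar{x})$ instead of
$\forall \bar{x}\, (\varphi(\bar{x}) \rightarrow \psi(\bar{x}))$.
Finally, the semantics of an $\mathcal{L}_1$-TO-$\mathcal{L}_2$
dependency is defined as usual (e.g., see [14, 4]).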

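If $\mathbf{S}$ is a source schema and $\mathbf{T}$ is a
target schema, an $\mathcal{L}_1$-TO-$\mathcal{L}_2$
dependency from $\mathbf{S}$ to $\mathbf{T}$ is called an
$\mathcal{L}_1$-TO-$\mathcal{L}_2$ source-to-target dependency
($\mathcal{L}_1$-TO-$\mathcal{L}_2$ st-dependency), and an
$\mathcal{L}_1$-TO-$\mathcal{L}_2$ dependency from
$\mathbf{T}$ to $\mathbf{S}$ is called an
$\mathcal{L}_1$-TO-$\mathcal{L}_2$ target-to-source dependency
($\mathcal{L}_1$-TO-$\mathcal{L}_2$ ts-dependency). Notice
that the fundamental class of source-to-target
tuple-generating dependencies (st-tgds) [14] corresponds to
the class of CQ-TO-CQ st-dependencies.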

When considering a mapping specified by a set of
dependencies, we use the usual semantics given by logical
satisfaction. That is, if $\mathcal{M}$ is a mapping from
$\mathbf{R}_1$ to $\mathbf{R}_2$ specified by a set $\Sigma$
of $\mathcal{L}_1$-TO-$\mathcal{L}_2$ dependencies, we have
that $(I, J) \in \mathcal{M}$ if and only if
$I \in \mathrm{Inst}(\mathbf{R}_1)$,
$J \in \mathrm{Inst}(\mathbf{R}_2)$, and $(I, J)$ satisfies
$\Sigma$.

Query Answering. In this paper, we use CQ to denote the class
of conjunctive queries and UCQ to denote the class of unions
of conjunctive queries. Given a query $Q$ and a database
instance $I$, we denote by $Q(I)$ the evaluation of $Q$ over
$I$. Moreover, we use predicate $\mathbf{C}(\cdot)$ to
differentiate between constants and nulls, that is,
$\mathbf{C}(a)$ holds if and only if $a$ is a constant value.
We use $=$, $\neq$, and $\mathbf{C}$ as superscripts to denote
a class of queries enriched with equalities, inequalities, and
predicate $\mathbf{C}(\cdot)$, respectively. Thus, for
example, $\mathrm{UCQ}^{=,\mathbf{C}}$ is the class of unions
of conjunctive queries with equalities and predicate
$\mathbf{C}(\cdot)$.

As usual, the semantics of queries in the presence of schema
mappings is defined in terms of the notion of certain answer.
Assume that $\mathcal{M}$ is a mapping from a schema
$\mathbf{R}_1$ to a schema $\mathbf{R}_2$. Then given an
instance $I$ of $\mathbf{R}_1$ and a query $Q$ over
$\mathbf{R}_2$, the certain answers of $Q$ for $I$ under
$\mathcal{M}$, denoted by $\mathrm{certain}_{\mathcal{M}}(Q, I)$,
is the set of tuples that belong to the evaluation of $Q$ over
every possible solution for $I$ under $\mathcal{M}$, that is,
$\bigcap \{Q(J) \mid J \text{ is a solution for } I \text{ under } \mathcal{M}\}$.
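
As a minimal sketch of this definition, the following Python
function intersects the answers to a query over an explicitly
given, finite collection of solutions. In general the set of
solutions is infinite, so this is only an illustration. The names
certain_answers and all_t_tuples, the encoding of an instance as a
set of (relation, tuple) facts, and the sample data are assumptions
introduced here, not notation from the literature.

```python
from functools import reduce


def certain_answers(query, solutions):
    """Intersect query(J) over a finite, non-empty collection of solutions J."""
    answers = [set(query(j)) for j in solutions]
    if not answers:
        raise ValueError("at least one solution is required")
    return reduce(set.intersection, answers)


# Hypothetical query: return every tuple stored in the unary relation T.
def all_t_tuples(instance):
    return {args for (rel, args) in instance if rel == "T"}


# Two hypothetical solutions that agree only on the fact T(a).
j1 = {("T", ("a",)), ("T", ("b",))}
j2 = {("T", ("a",)), ("T", ("c",))}

print(certain_answers(all_t_tuples, [j1, j2]))  # {('a',)}
```

Only the tuple that appears in the answer over every listed
solution survives the intersection.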

Proviso. In this survey, only finite sets of dependencies 
are considered. 

3 Composition of Schema Mappings 

The composition operator has been identified as one of the
fundamental operators for the development of a framework for
managing schema mappings [7, 34, 36]. The goal of this
operator is to generate a mapping $\mathcal{M}_{13}$ that has
the same effect as applying successively two given mappings
$\mathcal{M}_{12}$ and $\mathcal{M}_{23}$, provided that the
target schema of $\mathcal{M}_{12}$ is the same as the source
schema of $\mathcal{M}_{23}$. In [17], Fagin et al. study the
composition for the widely used class of st-tgds. In
particular, they provide solutions to the three fundamental
problems for mapping operators considered in this paper, that
is, they provide a formal semantics for the composition
operator, they identify a mapping language that is appropriate
for expressing this operator, and they study the complexity of
composing schema mappings. In this section, we present these
solutions.

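In [17, 34], the authors propose a semantics for the
composition operator that is based on the semantics of this
operator for binary relations:

Definition 3.1 ([17, 34]) Let $\mathcal{M}_{12}$ be a mapping
from a schema $\mathbf{R}_1$ to a schema $\mathbf{R}_2$, and
$\mathcal{M}_{23}$ a mapping from $\mathbf{R}_2$ to a schema
$\mathbf{R}_3$. Then the composition of $\mathcal{M}_{12}$ and
$\mathcal{M}_{23}$ is defined as
$\mathcal{M}_{12} \circ \mathcal{M}_{23} = \{(I_1, I_3) \mid \exists I_2 : (I_1, I_2) \in \mathcal{M}_{12} \text{ and } (I_2, I_3) \in \mathcal{M}_{23}\}$.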
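
Definition 3.1 is the usual composition of binary relations, which
can be sketched directly for finite fragments of mappings. The code
below is illustrative only: mappings are in general infinite sets of
pairs, and the helper names instance and compose, as well as the
relation names A, B and C, are assumptions made for this example.

```python
def instance(*facts):
    """Build a hashable instance from (relation, tuple) facts."""
    return frozenset(facts)


def compose(m12, m23):
    """Return {(I1, I3) | there is I2 with (I1, I2) in m12 and (I2, I3) in m23}."""
    return {(i1, i3) for (i1, i2) in m12 for (j2, i3) in m23 if i2 == j2}


i1 = instance(("A", ("a",)))
i2 = instance(("B", ("a",)))
i2_alt = instance(("B", ("a",)), ("B", ("b",)))
i3 = instance(("C", ("a",)))

m12 = {(i1, i2), (i1, i2_alt)}   # two solutions for i1
m23 = {(i2_alt, i3)}             # only i2_alt has a solution

print(compose(m12, m23) == {(i1, i3)})  # True
```

The pair (i1, i3) belongs to the composition because a single
intermediate instance, i2_alt, witnesses it.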

Then Fagin et al. consider in [17] the natural question of
whether the composition of two mappings specified by st-tgds
can also be specified by a set of these dependencies.
Unfortunately, they prove in [17] that this is not the case,
as shown in the following example.
Example 3.2. (from [17]) Consider a schema $\mathbf{R}_1$
consisting of one binary relation Takes, that associates a
student name with a course she/he is taking, a schema
$\mathbf{R}_2$ consisting of a relation $\mathrm{Takes}_1$,
that is intended to be a copy of Takes, and of an additional
relation symbol Student, that associates a student with a
student id; and a schema $\mathbf{R}_3$ consisting of a binary
relation symbol Enrollment, that associates a student id with
the courses this student is taking. Consider now mappings
$\mathcal{M}_{12}$ and $\mathcal{M}_{23}$ specified by the
following sets of st-tgds:

$$\Sigma_{12} = \{\mathrm{Takes}(n, c) \rightarrow \mathrm{Takes}_1(n, c),\ \ \mathrm{Takes}(n, c) \rightarrow \exists s\, \mathrm{Student}(n, s)\},$$
$$\Sigma_{23} = \{\mathrm{Student}(n, s) \wedge \mathrm{Takes}_1(n, c) \rightarrow \mathrm{Enrollment}(s, c)\}.$$

Mapping $\mathcal{M}_{12}$ requires that a copy of every tuple
in Takes must exist in $\mathrm{Takes}_1$ and, moreover, that
each student name $n$ must be associated with some student id
$s$ in the relation Student. Mapping $\mathcal{M}_{23}$
requires that if a student with name $n$ and id $s$ takes a
course $c$, then $(s, c)$ is a tuple in the relation
Enrollment. Intuitively, in the composition mapping one would
like to replace the name $n$ of a student by a student id
$i_n$, and then for each course $c$ that is taken by $n$, one
would like to include the tuple $(i_n, c)$ in the table
Enrollment. Unfortunately, as shown in [17], it is not
possible to express this relationship by using a set of
st-tgds. In particular, an st-tgd of the form:

$$\mathrm{Takes}(n, c) \rightarrow \exists y\, \mathrm{Enrollment}(y, c) \qquad (1)$$

does not express the desired relationship, as it may associate
a distinct student id $y$ with each tuple $(n, c)$ in Takes
and, thus, it may create several identifiers for the same
student name. □

The previous example shows that in order to express the
composition of mappings specified by st-tgds, one has to use a
language more expressive than st-tgds. However, the example
gives little information about what the right language for
composition is. In fact, the composition of mappings
$\mathcal{M}_{12}$ and $\mathcal{M}_{23}$ in this example can
be defined in first-order logic (FO):

$$\forall n\, \exists y\, \forall c\, (\mathrm{Takes}(n, c) \rightarrow \mathrm{Enrollment}(y, c)),$$

which may lead to the conclusion that FO is a good alternative
to define the composition of mappings specified by st-tgds.
However, a complexity argument shows that this conclusion is
wrong. More specifically, given mappings
$\mathcal{M}_{12} = (\mathbf{R}_1, \mathbf{R}_2, \Sigma_{12})$
and
$\mathcal{M}_{23} = (\mathbf{R}_2, \mathbf{R}_3, \Sigma_{23})$,
where $\Sigma_{12}$ and $\Sigma_{23}$ are sets of st-tgds,
define the composition problem for $\mathcal{M}_{12}$ and
$\mathcal{M}_{23}$, denoted by
COMPOSITION($\mathcal{M}_{12}$, $\mathcal{M}_{23}$), as the
problem of verifying, given
$I_1 \in \mathrm{Inst}(\mathbf{R}_1)$ and
$I_3 \in \mathrm{Inst}(\mathbf{R}_3)$, whether
$(I_1, I_3) \in \mathcal{M}_{12} \circ \mathcal{M}_{23}$. If
the composition of $\mathcal{M}_{12}$ with $\mathcal{M}_{23}$
is defined by a set $\Sigma$ of formulas in some logic, then
COMPOSITION($\mathcal{M}_{12}$, $\mathcal{M}_{23}$) is reduced
to the problem of verifying whether a pair of instances
$(I_1, I_3)$ satisfies $\Sigma$. In particular, if $\Sigma$ is
a set of FO formulas, then
COMPOSITION($\mathcal{M}_{12}$, $\mathcal{M}_{23}$) is in
LOGSPACE, as verifying whether a fixed set of FO formulas is
satisfied by an instance is in LOGSPACE [40]. Thus, if for
some mappings $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$, the
complexity of the composition problem is higher than LOGSPACE,
one can conclude that FO is not capable of expressing the
composition. In fact, this higher complexity is proved in [17].

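Theorem 3.3 ([17]) For every pair of mappings
$\mathcal{M}_{12}$, $\mathcal{M}_{23}$ specified by st-tgds,
COMPOSITION($\mathcal{M}_{12}$, $\mathcal{M}_{23}$) is in NP.
Moreover, there exist mappings $\mathcal{M}^{*}_{12}$ and
$\mathcal{M}^{*}_{23}$ specified by st-tgds such that
COMPOSITION($\mathcal{M}^{*}_{12}$, $\mathcal{M}^{*}_{23}$) is
NP-complete.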

Theorem 3.3 not only shows that FO is not the right language
to express the composition of mappings given by st-tgds, but
also gives good insight into what needs to be added to st-tgds
to obtain a language closed under composition. Given that
COMPOSITION($\mathcal{M}_{12}$, $\mathcal{M}_{23}$) is in NP,
we know by Fagin's Theorem that the composition can be defined
by an existential second-order logic formula [12, 27]. In
fact, Fagin et al. use this property in [17] to obtain the
right language for composition. More specifically, Fagin et
al. extend st-tgds with existential second-order
quantification, which gives rise to the class of SO-tgds [17].
Formally, given schemas $\mathbf{R}_1$ and $\mathbf{R}_2$ with
no relation symbols in common, a second-order tuple-generating
dependency from $\mathbf{R}_1$ to $\mathbf{R}_2$ (SO-tgd) is a
formula of the form
$\exists \bar{f}\, (\forall \bar{x}_1 (\varphi_1 \rightarrow \psi_1) \wedge \cdots \wedge \forall \bar{x}_n (\varphi_n \rightarrow \psi_n))$,
where (1) each member of $\bar{f}$ is a function symbol, (2)
each formula $\varphi_i$ ($1 \leq i \leq n$) is a conjunction
of relational atoms of the form $S(y_1, \ldots, y_k)$ and
equality atoms of the form $t = t'$, where $S$ is a $k$-ary
relation symbol of $\mathbf{R}_1$ and $y_1, \ldots, y_k$ are
(not necessarily distinct) variables in $\bar{x}_i$, and $t$,
$t'$ are terms built from $\bar{x}_i$ and $\bar{f}$, (3) each
formula $\psi_i$ ($1 \leq i \leq n$) is a conjunction of
relational atomic formulas over $\mathbf{R}_2$ mentioning
terms built from $\bar{x}_i$ and $\bar{f}$, and (4) each
variable in $\bar{x}_i$ ($1 \leq i \leq n$) appears in some
relational atom of $\varphi_i$.

In [17], Fagin et al. show that SO-tgds are the right
dependencies for expressing the composition of mappings given
by st-tgds. First, it is not difficult to see that every set
of st-tgds can be transformed into an SO-tgd. For example, set
$\Sigma_{12}$ from Example 3.2 is equivalent to the following
SO-tgd:

$$\exists f \Big( \forall n \forall c\, (\mathrm{Takes}(n, c) \rightarrow \mathrm{Takes}_1(n, c)) \wedge \forall n \forall c\, (\mathrm{Takes}(n, c) \rightarrow \mathrm{Student}(n, f(n, c))) \Big).$$

Second, Fagin et al. show that SO-tgds are closed under
composition.

Theorem 3.4 ([17]) Let $\mathcal{M}_{12}$ and
$\mathcal{M}_{23}$ be mappings specified by SO-tgds. Then the
composition $\mathcal{M}_{12} \circ \mathcal{M}_{23}$ can also
be specified by an SO-tgd.

It should be noticed that the previous theorem can also be
applied to mappings that are specified by finite sets of
SO-tgds, as these dependencies are closed under conjunction.
Moreover, it is important to notice that Theorem 3.4 implies
that the composition of a finite number of mappings specified
by st-tgds can be defined by an SO-tgd, as every set of
st-tgds can be expressed as an SO-tgd.

Theorem 3.5 ([17]) The composition of a finite number of
mappings, each defined by a finite set of st-tgds, is defined
by an SO-tgd.

Example 3.6. Let $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$ be
the mappings defined in Example 3.2. The following SO-tgd
correctly specifies the composition of these two mappings:

$$\exists g \Big( \forall n \forall c\, (\mathrm{Takes}(n, c) \rightarrow \mathrm{Enrollment}(g(n), c)) \Big).$$

□
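
To make the difference between st-tgd (1) and the SO-tgd of Example
3.6 concrete, the following sketch checks both on finite instances.
It relies on an observation specific to this SO-tgd: the value g(n)
can be chosen independently for each name n, so it suffices to find,
for every n, one id that is enrolled in all the courses taken by n.
The function names and the encoding of Takes and Enrollment as sets
of pairs are assumptions made for illustration; this is not a
general procedure for SO-tgds, whose model checking is in NP.

```python
def satisfies_st_tgd_1(takes, enrollment):
    """Takes(n, c) -> exists y Enrollment(y, c); y may differ per tuple."""
    return all(any(c2 == c for (_, c2) in enrollment) for (_, c) in takes)


def satisfies_so_tgd(takes, enrollment):
    """exists g forall n, c (Takes(n, c) -> Enrollment(g(n), c))."""
    ids = {s for (s, _) in enrollment}
    courses_of = {}
    for n, c in takes:
        courses_of.setdefault(n, set()).add(c)
    # Each name needs a single id covering all of its courses.
    return all(
        any(all((s, c) in enrollment for c in courses) for s in ids)
        for courses in courses_of.values()
    )


takes = {("Chris", "logic"), ("Chris", "algebra")}
one_id = {("075", "logic"), ("075", "algebra")}
two_ids = {("075", "logic"), ("084", "algebra")}

print(satisfies_st_tgd_1(takes, two_ids))  # True
print(satisfies_so_tgd(takes, two_ids))    # False
print(satisfies_so_tgd(takes, one_id))     # True
```

The instance two_ids satisfies (1) but assigns two identifiers to
the same student name, which the SO-tgd rules out.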

Third, Fagin et al. prove in [17] that the converse of Theorem
3.5 also holds, thus showing that SO-tgds are exactly the
right language for representing the composition of mappings
given by st-tgds.

Theorem 3.7 ([17]) Every SO-tgd defines the composition of a
finite number of mappings, each defined by a finite set of
st-tgds.

Finally, Fagin et al. in [17] also study the complexity of
composing schema mappings. More specifically, they provide an
exponential-time algorithm that, given two mappings
$\mathcal{M}_{12}$ and $\mathcal{M}_{23}$, each specified by
an SO-tgd, returns a mapping $\mathcal{M}_{13}$ specified by
an SO-tgd and equivalent to the composition of
$\mathcal{M}_{12}$ and $\mathcal{M}_{23}$. Furthermore, they
show that exponentiality is unavoidable in such an algorithm,
as there exist mappings $\mathcal{M}_{12}$ and
$\mathcal{M}_{23}$, each specified by a finite set of st-tgds,
such that every SO-tgd that defines the composition of
$\mathcal{M}_{12}$ and $\mathcal{M}_{23}$ is of size
exponential in the size of $\mathcal{M}_{12}$ and
$\mathcal{M}_{23}$.

In [37], Nash et al. also study the composition problem and
extend the results of [17]. In particular, they study the
composition of mappings given by dependencies that need not be
source-to-target, and for all the classes of mappings
considered in that paper, they provide an algorithm that
attempts to compute the composition and give sufficient
conditions that guarantee that the algorithm will succeed.

3.1 Composition under closed world semantics

In [28], Libkin proposes an alternative semantics for schema
mappings and, in particular, for data exchange. Roughly
speaking, the main idea in [28] is that when exchanging data
with a set $\Sigma$ of st-tgds and a source instance $I$, one
generates a target instance $J$ such that every tuple in $J$
is justified by a formula in $\Sigma$ and a set of tuples from
$I$. A target instance $J$ that satisfies the above property
is called a closed-world solution for $I$ under $\Sigma$ [28].
In [29], Libkin and Sirangelo propose the language of
CQ-SkSTDs, which slightly extends the syntax of SO-tgds, and
study the composition problem under the closed-world semantics
for mappings given by sets of CQ-SkSTDs. Due to the lack of
space, we do not give here the formal definition of the
closed-world semantics, but instead we give an example that
shows the intuition behind it (see [29] for a formal
definition of the semantics and of CQ-SkSTDs).

Example 3.8. Let $\sigma$ be the SO-tgd of Example 3.6.
Formula $\sigma$ is also a CQ-SkSTD [29]. Consider now a
source instance $I$ such that
$\mathrm{Takes}^I = \{(\mathrm{Chris}, \mathrm{logic})\}$, and
the instances $J_1$ and $J_2$ such that:

$$\mathrm{Enrollment}^{J_1} = \{(075, \mathrm{logic})\}$$
$$\mathrm{Enrollment}^{J_2} = \{(075, \mathrm{logic}), (084, \mathrm{algebra})\}$$

Notice that both $(I, J_1)$ and $(I, J_2)$ satisfy $\sigma$
(considering an interpretation for function $g$ such that
$g(\mathrm{Chris}) = 075$). Thus, under the semantics based on
logical satisfaction [17], both $J_1$ and $J_2$ are solutions
for $I$. The crucial difference between $J_1$ and $J_2$ is
that $J_2$ has an unjustified tuple [28]; tuple (075, logic)
is justified by tuple (Chris, logic), while (084, algebra) has
no justification. In fact, $J_1$ is a closed-world solution
for $I$ under $\sigma$, but $J_2$ is not [28, 29]. □
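
The intuition of Example 3.8 can be sketched in a few lines of
Python. The code fixes the interpretation g(Chris) = 075 used in the
example and lists the Enrollment tuples that are not produced from
any Takes tuple. It is a deliberately simplified reading of
"justified": the formal closed-world semantics of [28, 29] is more
involved, and the function names and data encoding are assumptions
made here.

```python
def justified_facts(takes, g):
    """Enrollment tuples produced by Takes(n, c) -> Enrollment(g(n), c)."""
    return {(g[n], c) for (n, c) in takes}


def unjustified(takes, enrollment, g):
    """Enrollment tuples with no justification under the fixed function g."""
    return set(enrollment) - justified_facts(takes, g)


takes = {("Chris", "logic")}
g = {"Chris": "075"}          # interpretation of g from Example 3.8

j1 = {("075", "logic")}
j2 = {("075", "logic"), ("084", "algebra")}

print(unjustified(takes, j1, g))  # set()
print(unjustified(takes, j2, g))  # {('084', 'algebra')}
```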

Given a set $\Sigma$ of CQ-SkSTDs from $\mathbf{R}_1$ to
$\mathbf{R}_2$, we say that $\mathcal{M}$ is specified by
$\Sigma$ under the closed-world semantics, denoted by
$\mathcal{M} = \mathrm{cws}(\Sigma, \mathbf{R}_1, \mathbf{R}_2)$,
if
$\mathcal{M} = \{(I, J) \mid I \in \mathrm{Inst}(\mathbf{R}_1),\ J \in \mathrm{Inst}(\mathbf{R}_2) \text{ and } J \text{ is a closed-world solution for } I \text{ under } \Sigma\}$.
Notice that, as Example 3.8 shows, the mapping specified by a
formula (or a set of formulas) under the closed-world
semantics is different from the mapping specified by the same
formula under the semantics of [17]. Thus, it is not
immediately clear whether a closure property like the one in
Theorem 3.4 can be directly translated to the closed-world
semantics. In this respect, Libkin and Sirangelo [29] show
that the language of CQ-SkSTDs is closed under composition.

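Theorem 3.9 ([29]) Let
$\mathcal{M}_{12} = \mathrm{cws}(\Sigma_{12}, \mathbf{R}_1, \mathbf{R}_2)$
and
$\mathcal{M}_{23} = \mathrm{cws}(\Sigma_{23}, \mathbf{R}_2, \mathbf{R}_3)$,
where $\Sigma_{12}$ and $\Sigma_{23}$ are sets of CQ-SkSTDs.
Then there exists a set $\Sigma_{13}$ of CQ-SkSTDs such that
$\mathcal{M}_{12} \circ \mathcal{M}_{23} = \mathrm{cws}(\Sigma_{13}, \mathbf{R}_1, \mathbf{R}_3)$.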

4 Inversion of Schema Mappings 

In recent years, the problem of inverting schema mappings has
received a lot of attention. In particular, the issue of
providing a good semantics for this operator turned out to be
a difficult problem. Three main proposals for inverting
mappings have been considered so far in the literature:
Fagin-inverse [13], quasi-inverse [20] and maximum recovery
[4]. In this section, we present and compare these approaches.

Some of the notions mentioned above are only appropriate for
certain classes of mappings. In particular, the following two
classes of mappings are used in this section when defining and
comparing inverses. A mapping $\mathcal{M}$ from a schema
$\mathbf{R}_1$ to a schema $\mathbf{R}_2$ is said to be total
if $\mathrm{dom}(\mathcal{M}) = \mathrm{Inst}(\mathbf{R}_1)$,
and is said to be closed-down on the left if whenever
$(I, J) \in \mathcal{M}$ and $I' \subseteq I$, it holds that
$(I', J) \in \mathcal{M}$.



Furthermore, whenever a mapping is specified by a set of
formulas, we consider source instances as containing only
constant values, and target instances as containing constants
and null values. This is a natural assumption in a data
exchange context, since target instances generated as a result
of exchanging data may be incomplete; thus, null values are
used as placeholders for unknown information. In Section 4.3,
we consider inverses for alternative semantics of mappings
and, in particular, inverses for the extended semantics that
was proposed in [18] to deal with incomplete information in
source instances.

4.1 Fagin-inverse and quasi-inverse 

We start by considering the notion of inverse proposed by
Fagin in [13], which we call Fagin-inverse in this paper.¹
Roughly speaking, Fagin's definition is based on the idea that
a mapping composed with its inverse should be equal to the
identity schema mapping. Thus, given a schema $\mathbf{R}$,
Fagin first defines an identity mapping Id as
$\{(I_1, I_2) \mid I_1, I_2 \text{ are instances of } \mathbf{R} \text{ and } I_1 \subseteq I_2\}$.
Then a mapping $\mathcal{M}'$ is said to be a Fagin-inverse of
a mapping $\mathcal{M}$ if
$\mathcal{M} \circ \mathcal{M}' = \mathrm{Id}$. Notice that Id
is not the usual identity relation over $\mathbf{R}$. As
explained in [13], Id is appropriate as an identity for
mappings that are total and closed-down on the left and, in
particular, for the class of mappings specified by st-tgds.

Example 4.1. Let $\mathcal{M}$ be a mapping specified by
st-tgds $S(x) \rightarrow U(x)$ and $S(x) \rightarrow V(x)$.
Intuitively, $\mathcal{M}$ is Fagin-invertible since all the
information in the source relation $S$ is transferred to both
relations $U$ and $V$ in the target. In fact, the mapping
$\mathcal{M}'$ specified by ts-tgd $U(x) \rightarrow S(x)$ is
a Fagin-inverse of $\mathcal{M}$ since
$\mathcal{M} \circ \mathcal{M}' = \mathrm{Id}$. Moreover, the
mapping $\mathcal{M}''$ specified by ts-tgd
$V(x) \rightarrow S(x)$ is also a Fagin-inverse of
$\mathcal{M}$, which shows that there need not be a unique
Fagin-inverse. □

A first fundamental question about any notion of inverse is
for which class of mappings it is guaranteed to exist. The
following example from [13] shows that Fagin-inverses are not
guaranteed to exist for mappings specified by st-tgds.

Example 4.2. Let $\mathcal{M}$ be a mapping specified by
st-tgd $S(x, y) \rightarrow T(x)$. Intuitively, $\mathcal{M}$
has no Fagin-inverse since $\mathcal{M}$ only transfers the
information about the first component of $S$. In fact, it is
formally proved in [13] that this mapping is not
Fagin-invertible. □



¹ Fagin [13] named his notion simply inverse of a schema
mapping. Since we are comparing different semantics for the
inverse operator, we reserve the term inverse to refer to this
operator in general, and use the name Fagin-inverse for the
notion proposed in [13].



As pointed out in [20], the notion of Fagin-inverse is rather
restrictive as it is rare that a schema mapping possesses a
Fagin-inverse. Thus, there is a need for weaker notions of
inversion, which is the main motivation for the introduction
of the notion of quasi-inverse of a schema mapping in [20].

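The idea behind quasi-inverses is to relax the notion of
Fagin-inverse by not differentiating between source instances
that have the same space of solutions. More precisely, let
$\mathcal{M}$ be a mapping from a schema $\mathbf{R}_1$ to a
schema $\mathbf{R}_2$. Instances $I_1$ and $I_2$ of
$\mathbf{R}_1$ are data-exchange equivalent w.r.t.
$\mathcal{M}$, denoted by $I_1 \sim_{\mathcal{M}} I_2$, if
$\mathrm{Sol}_{\mathcal{M}}(I_1) = \mathrm{Sol}_{\mathcal{M}}(I_2)$.
For example, for the mapping $\mathcal{M}$ in Example 4.2, we
have that $I_1 \sim_{\mathcal{M}} I_2$, with
$I_1 = \{S(1, 2)\}$ and $I_2 = \{S(1, 3)\}$. Then
$\mathcal{M}'$ is said to be a quasi-inverse of $\mathcal{M}$
if the property $\mathcal{M} \circ \mathcal{M}' = \mathrm{Id}$
holds modulo the equivalence relation $\sim_{\mathcal{M}}$.
Formally, given a mapping $\mathcal{N}$ from $\mathbf{R}$ to
$\mathbf{R}$, mapping
$\mathcal{N}[\sim_{\mathcal{M}}, \sim_{\mathcal{M}}]$ is
defined as

$$\{(I_1, I_2) \in \mathrm{Inst}(\mathbf{R}) \times \mathrm{Inst}(\mathbf{R}) \mid \text{there exist } I_1', I_2' \text{ with } I_1 \sim_{\mathcal{M}} I_1',\ I_2 \sim_{\mathcal{M}} I_2' \text{ and } (I_1', I_2') \in \mathcal{N}\}.$$

Then a mapping $\mathcal{M}'$ is said to be a quasi-inverse of
a mapping $\mathcal{M}$ if
$(\mathcal{M} \circ \mathcal{M}')[\sim_{\mathcal{M}}, \sim_{\mathcal{M}}] = \mathrm{Id}[\sim_{\mathcal{M}}, \sim_{\mathcal{M}}]$.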
Example 4.3. Let $\mathcal{M}$ be a mapping specified by
st-tgd $S(x, y) \rightarrow T(x)$. It was shown in Example 4.2
that $\mathcal{M}$ does not have a Fagin-inverse. However,
mapping $\mathcal{M}'$ specified by ts-tgd
$T(x) \rightarrow \exists y\, S(x, y)$ is a quasi-inverse of
$\mathcal{M}$ [20]. Notice that for the source instance
$I_1 = \{S(1, 2)\}$, we have that $I_1$ and
$I_2 = \{S(1, 3)\}$ are both solutions for $I_1$ under the
composition $\mathcal{M} \circ \mathcal{M}'$. In fact, for
every $I$ such that $I \sim_{\mathcal{M}} I_1$, we have that
$I$ is a solution for $I_1$ under
$\mathcal{M} \circ \mathcal{M}'$. □
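
For the specific mapping of Examples 4.2 and 4.3, data-exchange
equivalence has a simple concrete form that can be sketched in code.
Under the st-tgd S(x, y) → T(x), the solutions for a source instance
are exactly the target instances whose relation T contains every
first component of S, so two source instances have the same space of
solutions if and only if their first components coincide. This
characterization holds for this mapping only, and the function names
and the encoding of S as a set of pairs are assumptions made for
illustration.

```python
def first_components(s_relation):
    """Project the binary relation S onto its first attribute."""
    return {x for (x, _) in s_relation}


def data_exchange_equivalent(s1, s2):
    """I1 ~_M I2 for the mapping specified by S(x, y) -> T(x)."""
    return first_components(s1) == first_components(s2)


i1 = {(1, 2)}   # I1 = {S(1, 2)}
i2 = {(1, 3)}   # I2 = {S(1, 3)}
i3 = {(2, 3)}   # a hypothetical instance with a different first component

print(data_exchange_equivalent(i1, i2))  # True
print(data_exchange_equivalent(i1, i3))  # False
```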

In [20], the authors show that if a mapping $\mathcal{M}$ is
Fagin-invertible, then a mapping $\mathcal{M}'$ is a
Fagin-inverse of $\mathcal{M}$ if and only if $\mathcal{M}'$
is a quasi-inverse of $\mathcal{M}$. Example 4.3 shows that
the opposite direction does not hold. Thus, the notion of
quasi-inverse is a strict generalization of the notion of
Fagin-inverse. Furthermore, the authors provide in [20] a
necessary and sufficient condition for the existence of
quasi-inverses for mappings specified by st-tgds, and use this
condition to show the following result:

Proposition 4.4 ([20]) There is a mapping $\mathcal{M}$
specified by a single st-tgd that has no quasi-inverse.

Thus, although numerous non-Fagin-invertible schema mappings
possess natural and useful quasi-inverses [20], there are
still simple mappings specified by st-tgds that have no
quasi-inverse. This leaves as an open problem the issue of
finding an appropriate notion of inversion for st-tgds, and it
is the main motivation for the introduction of the notion of
inversion discussed in the following section.



4.2 Maximum recovery 

We consider now the notion of maximum recovery introduced by
Arenas et al. in [4]. In that paper, the authors follow a
different approach to define a notion of inversion. In fact,
the main goal of [4] is not to define a notion of inverse
mapping, but instead to give a formal definition for what it
means for a mapping $\mathcal{M}'$ to recover sound
information with respect to a mapping $\mathcal{M}$. Such a
mapping $\mathcal{M}'$ is called a recovery of $\mathcal{M}$
in [4]. Given that, in general, there may exist many possible
recoveries for a given mapping, Arenas et al. introduce an
order relation on recoveries in [4], and show that this
naturally gives rise to the notion of maximum recovery, which
is a mapping that brings back the maximum amount of sound
information.

Let $\mathcal{M}$ be a mapping from a schema $\mathbf{R}_1$ to
a schema $\mathbf{R}_2$, and Id the identity schema mapping
over $\mathbf{R}_1$, that is,
$\mathrm{Id} = \{(I, I) \mid I \in \mathrm{Inst}(\mathbf{R}_1)\}$.
When trying to invert $\mathcal{M}$, the ideal would be to
find a mapping $\mathcal{M}'$ from $\mathbf{R}_2$ to
$\mathbf{R}_1$ such that
$\mathcal{M} \circ \mathcal{M}' = \mathrm{Id}$. Unfortunately,
in most cases this ideal is impossible to reach (for example,
for the case of mappings specified by st-tgds [13]). If for a
mapping $\mathcal{M}$, there is no mapping $\mathcal{M}_1$
such that $\mathcal{M} \circ \mathcal{M}_1 = \mathrm{Id}$, at
least one would like to find a schema mapping $\mathcal{M}_2$
that does not forbid the possibility of recovering the initial
source data. This gives rise to the notion of recovery
proposed in [4]. Formally, given a mapping $\mathcal{M}$ from
a schema $\mathbf{R}_1$ to a schema $\mathbf{R}_2$, a mapping
$\mathcal{M}'$ from $\mathbf{R}_2$ to $\mathbf{R}_1$ is a
recovery of $\mathcal{M}$ if
$(I, I) \in \mathcal{M} \circ \mathcal{M}'$ for every instance
$I \in \mathrm{dom}(\mathcal{M})$ [4].

In general, if $\mathcal{M}'$ is a recovery of $\mathcal{M}$,
then the smaller the space of solutions generated by
$\mathcal{M} \circ \mathcal{M}'$, the more informative
$\mathcal{M}'$ is about the initial source instances. This
naturally gives rise to the notion of maximum recovery; given
a mapping $\mathcal{M}$ and a recovery $\mathcal{M}'$ of it,
$\mathcal{M}'$ is said to be a maximum recovery of
$\mathcal{M}$ if for every recovery $\mathcal{M}''$ of
$\mathcal{M}$, it holds that
$\mathcal{M} \circ \mathcal{M}' \subseteq \mathcal{M} \circ \mathcal{M}''$
[4].
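
The two definitions above can be sketched for finite fragments of
mappings. The code below checks the recovery condition and compares
two recoveries by containment of their compositions with the
original mapping. Everything here is a toy illustration: real
mappings are infinite sets of pairs, and the function names, the
sample instances, and the three candidate mappings are assumptions
made for this example rather than constructions from [4].

```python
def compose(m1, m2):
    """Composition of finite mappings viewed as binary relations."""
    return {(a, c) for (a, b) in m1 for (b2, c) in m2 if b == b2}


def domain(m):
    return {i for (i, _) in m}


def is_recovery(m, m_prime):
    """(I, I) must belong to m o m_prime for every I in dom(m)."""
    composed = compose(m, m_prime)
    return all((i, i) in composed for i in domain(m))


def at_least_as_informative(m, r1, r2):
    """True if m o r1 is contained in m o r2."""
    return compose(m, r1) <= compose(m, r2)


i_a = frozenset({("S", (1, 2))})
i_b = frozenset({("S", (1, 3))})
i_c = frozenset({("S", (2, 2))})
j = frozenset({("T", (1,))})

m = {(i_a, j), (i_b, j)}
r1 = {(j, i_a), (j, i_b)}   # brings back exactly the possible sources
r2 = {(j, i_a)}             # loses i_b, so it is not a recovery
r3 = r1 | {(j, i_c)}        # a recovery, but less informative than r1

print(is_recovery(m, r1), is_recovery(m, r2), is_recovery(m, r3))
# True False True
print(at_least_as_informative(m, r1, r3))  # True
print(at_least_as_informative(m, r3, r1))  # False
```

Among the two recoveries in this fragment, r1 generates the smaller
space of solutions and is therefore the more informative one.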

Example 4.5. In [20], it was shown that the schema mapping
$\mathcal{M}$ specified by st-tgd

$$E(x, z) \wedge E(z, y) \rightarrow F(x, y) \wedge M(z)$$

has neither a Fagin-inverse nor a quasi-inverse. However, it
is possible to show that the schema mapping $\mathcal{M}'$
specified by ts-tgds:

$$F(x, y) \rightarrow \exists u\, (E(x, u) \wedge E(u, y)),$$
$$M(z) \rightarrow \exists v \exists w\, (E(v, z) \wedge E(z, w)),$$

is a maximum recovery of $\mathcal{M}$. Notice that,
intuitively, the mapping $\mathcal{M}'$ is making the best
effort to recover the initial data transferred by
$\mathcal{M}$. □



In [4], Arenas et al. study the relationship between the
notions of Fagin-inverse, quasi-inverse and maximum recovery.
It should be noticed that the first two notions are only
appropriate for total and closed-down on the left mappings
[13, 4]. Thus, the comparison in [4] focuses on these
mappings. More precisely, it is shown in [4] that for every
mapping $\mathcal{M}$ that is total and closed-down on the
left, if $\mathcal{M}$ is Fagin-invertible, then
$\mathcal{M}'$ is a Fagin-inverse of $\mathcal{M}$ if and only
if $\mathcal{M}'$ is a maximum recovery of $\mathcal{M}$.
Thus, from Example 4.5 one can conclude that the notion of
maximum recovery strictly generalizes the notion of
Fagin-inverse. The exact relationship between the notions of
quasi-inverse and maximum recovery is a bit more involved. For
every mapping $\mathcal{M}$ that is total and closed-down on
the left, it is shown in [4] that if $\mathcal{M}$ is
quasi-invertible, then $\mathcal{M}$ has a maximum recovery
and, furthermore, every maximum recovery of $\mathcal{M}$ is
also a quasi-inverse of $\mathcal{M}$.

In [4], the authors provide a necessary and sufficient
condition for the existence of a maximum recovery. It is
important to notice that this is a general condition, as it
can be applied to any mapping, as long as it is defined as a
set of pairs of instances. This condition is used in [4] to
prove that every mapping specified by a set of st-tgds has a
maximum recovery.

Theorem 4.6 ([4]) Every mapping $\mathcal{M}$ specified by a
finite set of st-tgds has a maximum recovery.

4.3 Inverses for alternative semantics 

When mappings are specified by sets of logical formulas, we
have considered the usual semantics of mappings based on
logical satisfaction. However, some alternative semantics have
been considered in the literature, such as the closed world
semantics [28], the universal semantics [15], and the extended
semantics [18]. Although some of the notions of inverse
discussed in the previous sections can be directly applied to
these alternative semantics, the positive and negative results
on the existence of inverses need to be reconsidered in these
particular cases. In this section, we focus on this problem
for the universal and extended semantics of mappings.

4.3.1 Universal solutions semantics 

Recall that a homomorphism from an instance $J_1$ to an instance $J_2$ is a function $h : \text{dom}(J_1) \to \text{dom}(J_2)$ such that (1) $h(c) = c$ for every constant $c \in \text{dom}(J_1)$, and (2) for every fact $R(a_1, \ldots, a_k)$ in $J_1$, the fact $R(h(a_1), \ldots, h(a_k))$ is in $J_2$. Given a mapping $\mathcal{M}$ and a source instance $I$, a target instance $J \in \text{Sol}_{\mathcal{M}}(I)$ is a universal solution for $I$ under $\mathcal{M}$ if for every $J' \in \text{Sol}_{\mathcal{M}}(I)$, there exists a homomorphism from $J$ to $J'$. It was shown in [14, 15] that universal solutions have several desirable properties for data exchange. In view of this fact, an alternative semantics based on universal solutions was proposed in [15] for schema mappings. Given a mapping $\mathcal{M}$, the mapping $u(\mathcal{M})$ is defined as the set of pairs

$$\{(I, J) \mid J \text{ is a universal solution for } I \text{ under } \mathcal{M}\}.$$
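The homomorphism condition above can be checked mechanically on small instances. The following is a minimal brute-force sketch, not a procedure from the literature cited here. It assumes, purely for illustration, that an instance is a Python set of facts `(relation_name, tuple_of_values)` and that labelled nulls are strings starting with `_`; the names `is_null`, `domain` and `find_homomorphism` are hypothetical.

```python
from itertools import product

def is_null(value):
    # Assumed convention: labelled nulls are strings starting with "_".
    return isinstance(value, str) and value.startswith("_")

def domain(instance):
    # All values occurring in some fact of the instance.
    return {v for _, args in instance for v in args}

def find_homomorphism(j1, j2):
    """Return a dict h (on the nulls of j1) witnessing a homomorphism
    from j1 to j2, or None if no homomorphism exists."""
    nulls = sorted(v for v in domain(j1) if is_null(v))
    targets = sorted(domain(j2), key=str)
    # Try every assignment of the nulls of j1 to values of j2.
    for image in product(targets, repeat=len(nulls)):
        h = dict(zip(nulls, image))
        # (1) constants are mapped to themselves; (2) every fact is preserved.
        if all((rel, tuple(h.get(a, a) for a in args)) in j2
               for rel, args in j1):
            return h
    return None

# J1 uses a null as the middle value, J2 uses the constant "c".
J1 = {("T", ("a", "_n1")), ("T", ("_n1", "b"))}
J2 = {("T", ("a", "c")), ("T", ("c", "b"))}
print(find_homomorphism(J1, J2))
print(find_homomorphism(J2, J1))
```

The search is exponential in the number of nulls of the first instance, so it is only meant for experimenting with small examples.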

Mapping $u(\mathcal{M})$ was introduced in [13] in order to give a clean semantics for answering target queries after exchanging data with mapping $\mathcal{M}$. By combining the results on universal solutions for mappings given by st-tgds in [14] and the results in [5] on the existence of maximum recoveries, one can easily prove the following:

Proposition 4.7 Let $\mathcal{M}$ be a mapping specified by a set of st-tgds. Then $u(\mathcal{M})$ has a maximum recovery. Moreover, the mapping $(u(\mathcal{M}))^{-1} = \{(J, I) \mid (I, J) \in u(\mathcal{M})\}$ is a maximum recovery of $u(\mathcal{M})$.

4.3.2 Extended solutions semantics 

A more delicate issue regarding the semantics of mappings was considered in [18]. In that paper, Fagin et al. made the observation that almost all the literature about data exchange and, in particular, the literature about inverses of schema mappings, assumes that source instances do not have null values. Since null values in the source may naturally arise when using inverses of mappings to exchange data, the authors relax the restriction on source instances, allowing them to contain values in $\mathbf{C} \cup \mathbf{N}$. In fact, the authors go a step further and propose new refined notions for inverting mappings that consider nulls in the source. In particular, they propose the notions of extended inverse, and of extended recovery and maximum extended recovery. In this section, we review the definitions of the latter two notions and compare them with the previously proposed notions of recovery and maximum recovery.

The first observation to make is that since null values are intended to represent missing or unknown information, they should not be treated naively as constants [26]. In fact, as shown in [18], if one treats nulls in that way, the existence of a maximum recovery for mappings given by st-tgds is no longer guaranteed.

Example 4.8. Consider a source schema $\{S(\cdot, \cdot)\}$ where instances may contain null values, and let $\mathcal{M}$ be a mapping specified by the st-tgd $S(x, y) \to \exists z\, (T(x, z) \wedge T(z, y))$. Then $\mathcal{M}$ has no maximum recovery if one considers a naive semantics where null elements are used as constants in the source [18]. □



Since nulls should not be treated naively when exchanging data, in [18] the authors proposed a new way to deal with null values. Intuitively, the idea in [18] is to close mappings under homomorphisms. This idea is supported by the fact that nulls are intended to represent unknown data; thus, it should be possible to replace them by arbitrary values. Formally, given a mapping $\mathcal{M}$, define $e(\mathcal{M})$, the homomorphic extension of $\mathcal{M}$, as the mapping:

$$\{(I, J) \mid \exists (I', J') : (I', J') \in \mathcal{M} \text{ and there exist homomorphisms from } I \text{ to } I' \text{ and from } J' \text{ to } J\}.$$

Thus, for a mapping $\mathcal{M}$ that has nulls in source and target instances, one should consider not $\mathcal{M}$ but $e(\mathcal{M})$ as the mapping to deal with for exchanging data and computing mapping operators, since $e(\mathcal{M})$ treats nulls in a meaningful way [18]. The following result shows that with this new semantics one can avoid anomalies such as the one shown in Example 4.8.
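To make the definition of the homomorphic extension concrete, the sketch below tests whether a pair $(I, J)$ belongs to $e(\mathcal{M})$ when $\mathcal{M}$ is given explicitly as a finite list of pairs of instances. This finiteness is an assumption made only for illustration (mappings specified by st-tgds are in general infinite sets of pairs), as are the fact encoding, the convention that nulls are strings starting with `_`, and all function names.

```python
from itertools import product

def is_null(value):
    # Assumed convention: labelled nulls are strings starting with "_".
    return isinstance(value, str) and value.startswith("_")

def has_homomorphism(j1, j2):
    """True if some homomorphism maps instance j1 into instance j2."""
    nulls = sorted({v for _, args in j1 for v in args if is_null(v)})
    targets = sorted({v for _, args in j2 for v in args}, key=str)
    for image in product(targets, repeat=len(nulls)):
        h = dict(zip(nulls, image))
        if all((rel, tuple(h.get(a, a) for a in args)) in j2
               for rel, args in j1):
            return True
    return False

def in_homomorphic_extension(mapping, i, j):
    """(i, j) is in e(M) iff some (i2, j2) in M admits homomorphisms
    from i to i2 and from j2 to j."""
    return any(has_homomorphism(i, i2) and has_homomorphism(j2, j)
               for i2, j2 in mapping)

# A one-pair toy mapping built from the st-tgd of Example 4.8.
M = [({("S", ("a", "b"))}, {("T", ("a", "_n")), ("T", ("_n", "b"))})]
I_with_null = {("S", ("_x", "b"))}
J_ground = {("T", ("a", "c")), ("T", ("c", "b"))}
print(in_homomorphic_extension(M, I_with_null, J_ground))
print(in_homomorphic_extension(M, {("S", ("c", "b"))}, J_ground))
```

In the first call the source null `_x` is mapped to `a` and the target null `_n` to `c`; in the second call the constant `c` cannot be moved, so no pair of the toy mapping qualifies.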

Theorem 4.9 ([19]) For every mapping $\mathcal{M}$ specified by a set of st-tgds and with nulls in source and target instances, $e(\mathcal{M})$ has a maximum recovery.

As mentioned above, Fagin et al. go a step further in [18] by introducing new notions of inverse for mappings that consider nulls in the source. More specifically, a mapping $\mathcal{M}'$ is said to be an extended recovery of $\mathcal{M}$ if $(I, I) \in e(\mathcal{M}) \circ e(\mathcal{M}')$ for every source instance $I$. Then, given an extended recovery $\mathcal{M}'$ of $\mathcal{M}$, the mapping $\mathcal{M}'$ is said to be a maximum extended recovery of $\mathcal{M}$ if for every extended recovery $\mathcal{M}''$ of $\mathcal{M}$, it holds that $e(\mathcal{M}) \circ e(\mathcal{M}') \subseteq e(\mathcal{M}) \circ e(\mathcal{M}'')$ [18].

At first glance, one may think that the notions of maximum recovery and maximum extended recovery are incomparable. Nevertheless, the next result shows that there is a tight connection between these two notions. In particular, it shows that the notion proposed in [18] can be defined in terms of the notion of maximum recovery.

Theorem 4.10 A mapping $\mathcal{M}$ has a maximum extended recovery if and only if $e(\mathcal{M})$ has a maximum recovery. Moreover, $\mathcal{M}'$ is a maximum extended recovery of $\mathcal{M}$ if and only if $e(\mathcal{M}')$ is a maximum recovery of $e(\mathcal{M})$.

In [18], it is proved that every mapping specified by a set of st-tgds and considering nulls in the source has a maximum extended recovery. It should be noticed that this result is also implied by Theorems 4.9 and 4.10.

Finally, another conclusion that can be drawn from the above result is that all the machinery developed in [4, 5] for the notion of maximum recovery can be applied to maximum extended recoveries and to the extended semantics for mappings, thus giving new insight into inverses of mappings with null values in the source.



4.4 Computing the inverse 

Up to this point, we have introduced and compared three 
notions of inverse proposed in the literature, focusing 
mainly on the fundamental problem of the existence of 
such inverses. In this section, we study the problem of 
computing these inverses. More specifically, we present 
some of the algorithms that have been proposed in the lit- 
erature for computing them, and we study the languages 
used in these algorithms to express these inverses. 

Arguably, the most important problem to solve in this area is the problem of computing inverses of mappings specified by st-tgds. This problem has been studied for the case of Fagin-inverse [20, 21], quasi-inverse [20], maximum recovery [4, 5, 3] and maximum extended recovery [18, 19]. In this section, we start by presenting the algorithm proposed in [4, 5] for computing maximum recoveries of mappings specified by st-tgds, which by the results of Sections 4.1 and 4.2 can also be used to compute Fagin-inverses and quasi-inverses for this class of mappings. Interestingly, this algorithm is based on query rewriting, which greatly simplifies the process of computing such inverses.

Let $\mathcal{M}$ be a mapping from a schema $\mathbf{R}_1$ to a schema $\mathbf{R}_2$ and $Q$ a query over schema $\mathbf{R}_2$. Then a query $Q'$ is said to be a rewriting of $Q$ over the source if $Q'$ is a query over $\mathbf{R}_1$ such that for every $I \in \text{Inst}(\mathbf{R}_1)$, it holds that $Q'(I) = \text{certain}_{\mathcal{M}}(Q, I)$. That is, to obtain the set of certain answers of $Q$ over $I$ under $\mathcal{M}$, one just has to evaluate its rewriting $Q'$ over instance $I$.
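For intuition about the right-hand side of this definition, the sketch below computes certain answers of a conjunctive query directly, rather than through a rewriting. It relies on the standard data-exchange fact that, for st-tgds and conjunctive queries, the certain answers are the null-free answers over a canonical universal solution obtained by chasing the source. The encoding is an assumption for illustration only: atoms are `(relation, tuple_of_variables)` with no constants, an st-tgd is a triple `(premise, existential_variables, conclusion)`, nulls are strings starting with `_`, and all function names are hypothetical.

```python
from itertools import count

def is_null(value):
    # Assumed convention: labelled nulls are strings starting with "_".
    return isinstance(value, str) and value.startswith("_")

def match(atoms, instance, binding=None):
    """Yield every assignment of variables that maps all atoms into instance."""
    binding = binding or {}
    if not atoms:
        yield dict(binding)
        return
    (rel, variables), rest = atoms[0], atoms[1:]
    for fact_rel, values in instance:
        if fact_rel != rel or len(values) != len(variables):
            continue
        extended = dict(binding)
        # Bind each variable, rejecting facts that clash with earlier bindings.
        if all(extended.setdefault(v, a) == a for v, a in zip(variables, values)):
            yield from match(rest, instance, extended)

def chase(source, tgds):
    """Build a canonical universal solution, inventing a fresh null for
    every existential variable each time a premise is matched."""
    fresh, target = count(1), set()
    for premise, exist_vars, conclusion in tgds:
        for binding in match(premise, source):
            full = dict(binding)
            for y in exist_vars:
                full[y] = f"_n{next(fresh)}"
            for rel, variables in conclusion:
                target.add((rel, tuple(full[v] for v in variables)))
    return target

def certain_answers(head, body, source, tgds):
    """Null-free answers of the conjunctive query head :- body over the chase."""
    solution = chase(source, tgds)
    answers = {tuple(b[v] for v in head) for b in match(body, solution)}
    return {t for t in answers if not any(is_null(v) for v in t)}

# The st-tgd S(x, y) -> EXISTS z (T(x, z) AND T(z, y)) of Example 4.8.
tgds = [([("S", ("x", "y"))], ["z"], [("T", ("x", "z")), ("T", ("z", "y"))])]
source = {("S", ("a", "b"))}
two_step = certain_answers(("x", "y"), [("T", ("x", "z")), ("T", ("z", "y"))], source, tgds)
first_column = certain_answers(("x",), [("T", ("x", "z"))], source, tgds)
print(two_step, first_column)
```

On this particular source instance, evaluating the source query $S(x, y)$ returns the same set as `two_step`, which is the behaviour one expects of a rewriting over the source.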

The computation of a rewriting of a conjunctive query 
is a basic step in the first algorithm presented in this sec- 
tion. This problem has been extensively studied in the 
database area BTl [32l [TT1 Q] [381 and, in particular, in the 
data integration context [24, 23, 30]. The following algorithm uses a query rewriting procedure QueryRewriting to compute a maximum recovery of a mapping $\mathcal{M}$ specified by a set $\Sigma$ of st-tgds. In the algorithm, if $\bar{x} = (x_1, \ldots, x_k)$, then $\mathbf{C}(\bar{x})$ is a shorthand for $\mathbf{C}(x_1) \wedge \cdots \wedge \mathbf{C}(x_k)$.

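Algorithm MaximumRecovery($\mathcal{M}$)
Input: $\mathcal{M} = (\mathbf{S}, \mathbf{T}, \Sigma)$, where $\Sigma$ is a set of st-tgds.
Output: $\mathcal{M}' = (\mathbf{T}, \mathbf{S}, \Sigma')$, where $\Sigma'$ is a set of $\mathrm{CQ}^{\mathbf{C}}$-TO-$\mathrm{UCQ}^{=}$ ts-dependencies and $\mathcal{M}'$ is a maximum recovery of $\mathcal{M}$.

1. Start with $\Sigma'$ as the empty set.

2. For every dependency of the form $\varphi(\bar{x}) \to \exists \bar{y}\, \psi(\bar{x}, \bar{y})$ in $\Sigma$, do the following:

(a) Let $Q$ be the query defined by $\exists \bar{y}\, \psi(\bar{x}, \bar{y})$.

(b) Use QueryRewriting($\mathcal{M}$, $Q$) to compute a formula $\alpha(\bar{x})$ in $\mathrm{UCQ}^{=}$ that is a rewriting of $\exists \bar{y}\, \psi(\bar{x}, \bar{y})$ over the source.

(c) Add dependency $\exists \bar{y}\, \psi(\bar{x}, \bar{y}) \wedge \mathbf{C}(\bar{x}) \to \alpha(\bar{x})$ to $\Sigma'$.

3. Return $\mathcal{M}' = (\mathbf{T}, \mathbf{S}, \Sigma')$. □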
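The control flow of the algorithm can be sketched in a few lines. The sketch below only mirrors steps 1 to 3; it does not implement query rewriting, which is passed in as a caller-supplied function. Formulas are plain strings, each st-tgd is a triple `(premise, frontier_variables, conclusion)`, and `toy_rewriting` is a hand-written lookup table standing in for QueryRewriting. All of these representation choices and names are assumptions made for illustration.

```python
def maximum_recovery(sigma, query_rewriting):
    """Mirror of algorithm MaximumRecovery over string-encoded dependencies.

    sigma: list of (premise, frontier_variables, conclusion) triples.
    query_rewriting: function (sigma, query, frontier) -> source formula.
    """
    sigma_prime = []                                   # step 1
    for premise, frontier, conclusion in sigma:        # step 2
        query = conclusion                             # step 2(a)
        alpha = query_rewriting(sigma, query, frontier)  # step 2(b)
        # C(x1) AND ... AND C(xk) for the frontier variables.
        c_atoms = " AND ".join(f"C({x})" for x in frontier)
        lhs = f"{conclusion} AND {c_atoms}" if c_atoms else conclusion
        sigma_prime.append(f"{lhs} -> {alpha}")        # step 2(c)
    return sigma_prime                                 # step 3

def toy_rewriting(sigma, query, frontier):
    # Hypothetical stub: a fixed table of rewritings worked out by hand.
    table = {"EXISTS z (T(x, z) AND T(z, y))": "S(x, y)"}
    return table[query]

# The mapping of Example 4.8, with frontier variables x and y.
sigma = [("S(x, y)", ("x", "y"), "EXISTS z (T(x, z) AND T(z, y))")]
recovery = maximum_recovery(sigma, toy_rewriting)
print(recovery)
```

Keeping the rewriting procedure as a parameter reflects the point made in the text that any existing query rewriting technique can be plugged into the computation.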

Theorem 4.11 ([4, 5]) Let $\mathcal{M} = (\mathbf{S}, \mathbf{T}, \Sigma)$, where $\Sigma$ is a set of st-tgds. Then MaximumRecovery($\mathcal{M}$) computes, in exponential time in the size of $\Sigma$, a maximum recovery of $\mathcal{M}$ that is specified by a set of $\mathrm{CQ}^{\mathbf{C}}$-TO-$\mathrm{UCQ}^{=}$ dependencies. Moreover, if $\mathcal{M}$ is Fagin-invertible (quasi-invertible), then the output of MaximumRecovery($\mathcal{M}$) is a Fagin-inverse (quasi-inverse) of $\mathcal{M}$.

It is important to notice that the algorithm MaximumRecovery returns a mapping that is a Fagin-inverse of an input mapping $\mathcal{M}$ whenever $\mathcal{M}$ is Fagin-invertible, but it does not check whether $\mathcal{M}$ indeed satisfies this condition (and likewise for the case of quasi-inverse). In fact, it is not immediately clear whether the problem of checking if a mapping given by a set of st-tgds has a Fagin-inverse is decidable. In [21], the authors solve this problem, showing the following:

Theorem 4.12 ([21]) The problem of verifying whether a mapping specified by a set of st-tgds is Fagin-invertible is coNP-complete.

Interestingly, it is not known whether the previous prob- 
lem is decidable for the case of the notion of quasi-inverse. 

One of the interesting features of algorithm MaximumRecovery is the use of query rewriting, as it allows one to reuse, in the computation of an inverse, the large number of techniques developed to deal with the problem of query rewriting. However, one can identify two drawbacks in this procedure. First, algorithm MaximumRecovery returns a mapping that is specified by a set of $\mathrm{CQ}^{\mathbf{C}}$-TO-$\mathrm{UCQ}^{=}$ dependencies. Unfortunately, mappings of this type are difficult to use in the data exchange context. In particular, it is not clear whether the standard chase procedure could be used to produce a single canonical target database in this case, thus making the process of exchanging data and answering queries much more complicated. Second, the output mapping of MaximumRecovery can be of exponential size in the size of the input mapping. Thus, a natural question at this point is whether simpler and smaller inverse mappings can be computed. In the rest of this section, we show some negative results in this respect, and also some efforts to overcome these limitations by using more expressive mapping languages.

The languages needed to express Fagin-inverses and quasi-inverses are investigated in [20, 21]. In this respect, the first negative result proved in [20] is that there exist quasi-invertible mappings specified by st-tgds whose quasi-inverse cannot be specified by st-tgds. In fact, it is proved in [20] that the quasi-inverse of a mapping given by st-tgds can be specified by using $\mathrm{CQ}^{\neq,\mathbf{C}}$-TO-$\mathrm{UCQ}$ dependencies, and that inequality, predicate $\mathbf{C}(\cdot)$ and disjunction are all unavoidable in this language in order to express such a quasi-inverse. For the case of Fagin-inverse, it is shown in [20] that disjunctions are not needed, that is, the class of $\mathrm{CQ}^{\neq,\mathbf{C}}$-TO-$\mathrm{CQ}$ dependencies is expressive enough to represent the Fagin-inverse of a Fagin-invertible mapping specified by a set of st-tgds. In
lTT~3l |2"D . it is proved a second negative result about the 
languages needed to express Fagin-inverses, namely that there is a family of Fagin-invertible mappings $\mathcal{M}$ specified by st-tgds such that the size of every Fagin-inverse of $\mathcal{M}$ specified by a set of $\mathrm{CQ}^{\neq,\mathbf{C}}$-TO-$\mathrm{CQ}$ dependencies is exponential in the size of $\mathcal{M}$. Similar results are proved in [4, 5] for the case of maximum recoveries of mappings specified by st-tgds. More specifically, it is proved in [4] that the maximum recovery of a mapping given by st-tgds can be specified by using $\mathrm{CQ}^{\mathbf{C}}$-TO-$\mathrm{UCQ}^{=}$ dependencies, and that equality, predicate $\mathbf{C}(\cdot)$ and disjunction are all unavoidable in this language in order to express such a maximum recovery. Moreover, it is proved in [5] that there is a family of mappings $\mathcal{M}$ specified by st-tgds such that the size of every maximum recovery of $\mathcal{M}$ specified by a set of $\mathrm{CQ}^{\mathbf{C}}$-TO-$\mathrm{UCQ}^{=}$ dependencies is exponential in the size of $\mathcal{M}$.



In view of the above negative results, Arenas et al. explore in [3] the possibility of using a more expressive language for representing inverses. In particular, they explore the possibility of using some extensions of the class of SO-tgds to express this operator. In fact, Arenas et al. provide in [3] a polynomial-time algorithm that, given a mapping $\mathcal{M}$ specified by a set of st-tgds, returns a maximum recovery of $\mathcal{M}$, which is specified in a language that extends SO-tgds (see [3] for a precise definition of this language). It should be noticed that the algorithm presented in [3] was designed to compute maximum recoveries of mappings specified in languages beyond st-tgds, such as the language of nested mappings [22] and plain SO-tgds (see Section 5 for a definition of the class of plain SO-tgds). Thus, the algorithm proposed in [3] can also be used to compute in polynomial time Fagin-inverses (quasi-inverses) of Fagin-invertible (quasi-invertible) mappings specified by st-tgds, nested mappings and plain SO-tgds. Interestingly, a similar approach was used in [19] to provide a polynomial-time algorithm for computing the maximum extended recovery for the case of mappings defined by st-tgds.



5 Query-based notions of composition and inverse

As we have discussed in the previous sections, to ex- 
press the composition and the inverse of schema mappings 
given by st-tgds, one usually needs mapping languages 
that are more expressive than st-tgds, and that do not have 
the same good properties for data exchange as st-tgds. 

As a way to overcome this limitation, some weaker notions of composition and inversion have been proposed in recent years, which are based on the idea that in practice one may be interested in querying exchanged data by using only a particular class of queries. In this section, we review these notions.

5.1 A query-based notion of composition 

In this section, we study the notion of composition w.r.t. conjunctive queries (CQ-composition for short) introduced by Madhavan and Halevy [33]. This semantics for composition can be defined in terms of the notion of conjunctive-query equivalence of mappings that was introduced in [33] for studying CQ-composition and generalized in [16] when studying optimization of schema mappings. Two mappings $\mathcal{M}$ and $\mathcal{M}'$ from $\mathbf{S}$ to $\mathbf{T}$ are said to be equivalent w.r.t. conjunctive queries, denoted by $\mathcal{M} \equiv_{CQ} \mathcal{M}'$, if for every conjunctive query $Q$, the set of certain answers of $Q$ under $\mathcal{M}$ coincides with the set of certain answers of $Q$ under $\mathcal{M}'$. Formally, $\mathcal{M} \equiv_{CQ} \mathcal{M}'$ if for every conjunctive query $Q$ over $\mathbf{T}$ and every instance $I$ of $\mathbf{S}$, it holds that $\text{certain}_{\mathcal{M}}(Q, I) = \text{certain}_{\mathcal{M}'}(Q, I)$. Then CQ-composition can be defined as follows: $\mathcal{M}_3$ is a CQ-composition of $\mathcal{M}_1$ and $\mathcal{M}_2$ if $\mathcal{M}_3 \equiv_{CQ} \mathcal{M}_1 \circ \mathcal{M}_2$.

A fundamental question about the notion of CQ-composition is whether the class of st-tgds is closed under this notion. This problem was implicitly studied by Fagin et al. [16] in the context of schema mapping optimization. In [16], the authors consider the problem of whether a mapping specified by an SO-tgd is CQ-equivalent to a mapping specified by st-tgds. Thus, given that the composition of a finite number of mappings given by st-tgds can be defined by an SO-tgd [17], the latter problem is a reformulation of the problem of testing whether st-tgds are closed under CQ-composition. In fact, by using the results and the examples in [16], one can easily construct mappings $\mathcal{M}_1$ and $\mathcal{M}_2$ given by st-tgds such that the CQ-composition of $\mathcal{M}_1$ and $\mathcal{M}_2$ is not definable by a finite set of st-tgds.

A second fundamental question about the notion of CQ-composition is what is the right language to express it. Although this problem is still open, in the rest of this section we shed light on this issue. By the results in [17], we know that the language of SO-tgds is enough to represent the CQ-composition of st-tgds. However, as motivated by the following example, some features of SO-tgds are not needed to express the CQ-composition of mappings given by st-tgds.

Example 5.1 (from [17]). Consider a schema $\mathbf{R}_1$ consisting of one unary relation Emp that stores employee names, a schema $\mathbf{R}_2$ consisting of a binary relation $\text{Mgr}_1$ that assigns a manager to each employee, and a schema $\mathbf{R}_3$ consisting of a binary relation Mgr intended to be a copy of $\text{Mgr}_1$ and of a unary relation SelfMgr that stores employees who are their own managers. Consider now mappings $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$ specified by the following sets of st-tgds:

$$\Sigma_{12} = \{\, \text{Emp}(e) \to \exists m\, \text{Mgr}_1(e, m) \,\},$$
$$\Sigma_{23} = \{\, \text{Mgr}_1(e, m) \to \text{Mgr}(e, m),\ \ \text{Mgr}_1(e, e) \to \text{SelfMgr}(e) \,\}.$$

Mapping $\mathcal{M}_{12}$ intuitively states that every employee must be associated with a manager. Mapping $\mathcal{M}_{23}$ requires that a copy of every tuple in $\text{Mgr}_1$ must exist in Mgr, and creates a tuple in SelfMgr whenever an employee is his or her own manager. It was shown in [17] that the mapping $\mathcal{M}_{13}$ given by the following SO-tgd:

$$\exists f \big( \forall e\, (\text{Emp}(e) \to \text{Mgr}(e, f(e))) \wedge \forall e\, (\text{Emp}(e) \wedge e = f(e) \to \text{SelfMgr}(e)) \big) \tag{2}$$

represents the composition $\mathcal{M}_{12} \circ \mathcal{M}_{23}$. Moreover, the authors prove in [17] that the equality in the above formula is strictly necessary to represent that composition. However, it is not difficult to prove that the mapping $\mathcal{M}'_{13}$ given by the following formula:

$$\exists f \big( \forall e\, (\text{Emp}(e) \to \text{Mgr}(e, f(e))) \big) \tag{3}$$

is CQ-equivalent to $\mathcal{M}_{13}$, and thus, $\mathcal{M}'_{13}$ is a CQ-composition of $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$. □
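The difference between formulas (2) and (3) can be observed on a small instance once an interpretation of the function symbol $f$ is fixed. The sketch below is only an illustration under assumed encodings: `emp` is a set of employee names, `f` is a Python dict playing the role of one interpretation of $f$, and `apply_so_tgd` (a hypothetical name) returns the target facts that the chosen formula forces for that interpretation.

```python
def apply_so_tgd(emp, f, with_equality):
    """Target facts forced by formula (2) (with_equality=True) or by
    formula (3) (with_equality=False) under the interpretation f."""
    target = set()
    for e in emp:
        # First conjunct, shared by both formulas: Emp(e) -> Mgr(e, f(e)).
        target.add(("Mgr", (e, f[e])))
        # Second conjunct, only in formula (2): Emp(e) AND e = f(e) -> SelfMgr(e).
        if with_equality and f[e] == e:
            target.add(("SelfMgr", (e,)))
    return target

emp = {"ann", "bob"}
f_self = {"ann": "ann", "bob": "ann"}    # ann manages herself
f_other = {"ann": "bob", "bob": "ann"}   # nobody is their own manager

print(apply_so_tgd(emp, f_self, with_equality=True))
print(apply_so_tgd(emp, f_self, with_equality=False))
print(apply_so_tgd(emp, f_other, with_equality=True))
```

The SelfMgr fact appears only for interpretations in which some employee is mapped to himself or herself. Since other interpretations such as `f_other` force no SelfMgr fact at all, this gives some intuition for why dropping the equality conjunct does not change answers that must hold in every solution.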

We say that formula (3) is a plain SO-tgd. Formally, a plain SO-tgd from $\mathbf{R}_1$ to $\mathbf{R}_2$ is an SO-tgd satisfying the following restrictions: (1) equality atoms are not allowed, and (2) nesting of functions is not allowed. Notice that, just as SO-tgds, this language is closed under conjunction and, thus, we talk about a mapping specified by a plain SO-tgd (instead of a set of plain SO-tgds). The following result shows that even though the language of plain SO-tgds is less expressive than the language of SO-tgds, they are equally expressive in terms of CQ-equivalence.
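The two syntactic restrictions are easy to test on an explicit representation of an SO-tgd. In the sketch below, which is an assumed encoding and not a format used in the literature cited here, an SO-tgd is a list of conjuncts `(premise_atoms, equalities, conclusion_atoms)`, a variable is a string, and a function term is a pair `(function_name, tuple_of_argument_terms)`.

```python
def is_function_term(term):
    # Assumed encoding: variables are strings, function terms are tuples.
    return isinstance(term, tuple)

def has_nested_function(term):
    """True if term is a function term with a function term as an argument."""
    return is_function_term(term) and any(is_function_term(a) for a in term[1])

def is_plain(so_tgd):
    """Check restrictions (1) no equality atoms and (2) no nested functions."""
    for _premise, equalities, conclusion in so_tgd:
        if equalities:                                   # restriction (1)
            return False
        for _rel, args in conclusion:
            if any(has_nested_function(t) for t in args):  # restriction (2)
                return False
    return True

f_of_e = ("f", ("e",))
formula_2 = [
    ([("Emp", ("e",))], [], [("Mgr", ("e", f_of_e))]),
    ([("Emp", ("e",))], [("e", f_of_e)], [("SelfMgr", ("e",))]),
]
formula_3 = [([("Emp", ("e",))], [], [("Mgr", ("e", f_of_e))])]
nested = [([("Emp", ("e",))], [], [("Mgr", ("e", ("f", (f_of_e,))))])]

print(is_plain(formula_2), is_plain(formula_3), is_plain(nested))
```

Here `formula_2` and `formula_3` encode formulas (2) and (3) of Example 5.1, while `nested` is a made-up conjunct using the term $f(f(e))$.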

Lemma 5.2 For every SO-tgd $\sigma$, there exists a plain SO-tgd $\sigma'$ such that $\sigma \equiv_{CQ} \sigma'$.



It is easy to see that every mapping specified by a set 
of st-tgds can be specified with a plain SO-tgd. Moreover, 
the following theorem shows that this language is closed 
under CQ-composition, thus showing that this class of de- 
pendencies has good properties within the framework of 
CQ-equivalence. 

Theorem 5.3 Let $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$ be mappings specified by plain SO-tgds. Then the CQ-composition of $\mathcal{M}_{12}$ and $\mathcal{M}_{23}$ can be specified with a plain SO-tgd.

Thus, the CQ-composition of a finite number of mappings, each specified by a set of st-tgds, is definable by a plain SO-tgd. It should be noticed that Theorem 5.3 is a consequence of Lemma 5.2 and the fact that the class of SO-tgds is closed under composition [17].

Besides the above-mentioned results, the language of plain SO-tgds also has good properties regarding inversion. In particular, it is proved in [3] that every plain SO-tgd has a maximum recovery and, moreover, that paper gives a polynomial-time algorithm to compute it. Thus, it can be argued that this class of dependencies is more suitable for inversion than SO-tgds, as there exist SO-tgds that do not admit maximum recoveries.

5.2 A query-based notion of inverse 

In [3], the authors propose an alternative notion of inverse by focusing on conjunctive queries. In particular, the authors first define the notion of CQ-recovery as follows. A mapping $\mathcal{M}'$ is a CQ-recovery of $\mathcal{M}$ if for every instance $I$ and conjunctive query $Q$, it holds that

$$\text{certain}_{\mathcal{M} \circ \mathcal{M}'}(Q, I) \subseteq Q(I).$$

Intuitively, this condition states that $\mathcal{M}'$ recovers sound information for $\mathcal{M}$ w.r.t. conjunctive queries since for every instance $I$, by posing a conjunctive query $Q$ against the space of solutions for $I$ under $\mathcal{M} \circ \mathcal{M}'$, one can only recover data that is already in the evaluation of $Q$ over $I$. A CQ-maximum recovery is then defined as a mapping that recovers the maximum amount of sound information w.r.t. conjunctive queries. Formally, a CQ-recovery $\mathcal{M}'$ of $\mathcal{M}$ is a CQ-maximum recovery of $\mathcal{M}$ if for every other CQ-recovery $\mathcal{M}''$ of $\mathcal{M}$, it holds that

$$\text{certain}_{\mathcal{M} \circ \mathcal{M}''}(Q, I) \subseteq \text{certain}_{\mathcal{M} \circ \mathcal{M}'}(Q, I),$$

for every instance $I$ and conjunctive query $Q$.

In [3], the authors study several properties about CQ- 
maximum recoveries. In particular, they provide an al- 
gorithm to compute CQ-maximum recoveries for st-tgds 
showing the following: 



Theorem 5.4 ([3]) Every mapping specified by a set of st-tgds has a CQ-maximum recovery, which is specified by a set of $\mathrm{CQ}^{\mathbf{C},\neq}$-TO-$\mathrm{CQ}$ dependencies.

Notice that the language needed to express CQ-maximum 
recoveries of st-tgds has the same good properties as st- 
tgds for data exchange. In particular, the language is 
chaseable in the sense that the standard chase procedure 
can be used to obtain a canonical solution. Thus, com- 
pared to the notions of Fagin-inverse, quasi-inverse, and 
maximum recovery, the notion of CQ-maximum recovery 
has two advantages: (1) every mapping specified by st- 
tgds has a CQ-maximum recovery (which is not the case 
for Fagin-inverses and quasi-inverses), and (2) such re- 
covery can be specified in a mapping language with good 
properties for data exchange (which is not the case for 
quasi-inverses and maximum recovery). 

In [3], the authors also study the minimality of the language used to express CQ-maximum recoveries, showing that inequalities and predicate $\mathbf{C}(\cdot)$ are both needed to express the CQ-maximum recoveries of mappings specified by st-tgds.

6 Future Work 

As many information-system problems involve not only the design and integration of complex application artifacts, but also their subsequent manipulation, the definition and implementation of some operators for metadata management has been identified as a fundamental issue to be solved [7]. In particular, composition and inverse have been identified as two of the fundamental operators to be studied in this area, as they can serve as building blocks of many other operators [34, 36]. In this paper, we have presented some of the results that have been obtained in recent years about the composition and inversion of schema mappings.

Many problems remain open in this area. Up to now, XML schema mapping languages have been proposed and studied [6, 2, 39], but little attention has been paid to the formal study of XML schema mapping operators. For the case of composition, a first insight has been given in [2], showing that the previous results for the relational model are not directly applicable over XML. Inversion of XML schema mappings remains an unexplored field.

Regarding the relational model, we believe that future effort has to be focused on providing a unifying framework for these operators, one that permits their successful application. A natural question, for instance, is whether there exists a schema mapping language that is closed under both composition and inverse. Needless to say, this unified framework will permit the modeling of more complex algebraic operators for schema mappings.
A Proofs and Intermediate Results



In this section, we provide proofs for the new results reported in this survey. Some of these proofs are related to the notion of maximum recovery proposed in [4], and its relationship with some other notions of inverse. The main tool used in this section regarding maximum recoveries is described in the following proposition.

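Proposition A.1 ([4]) $\mathcal{M}'$ is a maximum recovery of $\mathcal{M}$ if and only if $\mathcal{M}'$ is a recovery of $\mathcal{M}$ and for every $(I_1, I_2) \in \mathcal{M} \circ \mathcal{M}'$, it holds that $\text{Sol}_{\mathcal{M}}(I_2) \subseteq \text{Sol}_{\mathcal{M}}(I_1)$.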

It should be noticed that the above is a characterization of the notion of maximum recovery for a mapping $\mathcal{M}$ that is total, that is, if $\mathcal{M}$ is a mapping from $\mathbf{R}_1$ to $\mathbf{R}_2$, then $\text{dom}(\mathcal{M}) = \text{Inst}(\mathbf{R}_1)$.²

A.1 Proof of Proposition 4.7

Let $\mathcal{M}$ be a mapping specified by a set of st-tgds. We know by [14] that every source instance has a universal solution under $\mathcal{M}$, and, thus, $u(\mathcal{M})$ is a total mapping. Next we show that $u(\mathcal{M})$ and $u(\mathcal{M})^{-1}$ satisfy the condition of Proposition A.1, which implies that $u(\mathcal{M})^{-1}$ is a maximum recovery of $u(\mathcal{M})$.

It is straightforward to show that $u(\mathcal{M})^{-1}$ is a recovery of $u(\mathcal{M})$ and, hence, it only remains to prove that for every tuple $(I_1, I_2) \in u(\mathcal{M}) \circ u(\mathcal{M})^{-1}$ it holds that:

$$\text{Sol}_{u(\mathcal{M})}(I_2) \subseteq \text{Sol}_{u(\mathcal{M})}(I_1).$$

Assume that $(I_1, I_2) \in u(\mathcal{M}) \circ u(\mathcal{M})^{-1}$. Then there exists a target instance $J$ such that $(I_1, J) \in u(\mathcal{M})$ and $(J, I_2) \in u(\mathcal{M})^{-1}$. Thus, given that every solution in $u(\mathcal{M})$ is a universal solution in $\mathcal{M}$, we have that $J$ is a universal solution for both instances $I_1$ and $I_2$ in $\mathcal{M}$. Hence, we have by Proposition 2.6 in [14] that $\text{Sol}_{\mathcal{M}}(I_1) = \text{Sol}_{\mathcal{M}}(I_2)$ and, thus, $\text{Sol}_{u(\mathcal{M})}(I_2) \subseteq \text{Sol}_{u(\mathcal{M})}(I_1)$, which was to be shown.

A.2 Proof of Theorem 4.10

We first introduce some notation to simplify the exposition. Let $I_1$ and $I_2$ be instances of the same schema $\mathbf{R}$ with values in $\mathbf{C} \cup \mathbf{N}$. Recall that a homomorphism from $I_1$ to $I_2$ is a function $h : \text{dom}(I_1) \to \text{dom}(I_2)$ such that, for every constant value $a \in \mathbf{C}$, it holds that $h(a) = a$, and for every $R \in \mathbf{R}$ and every tuple $(a_1, \ldots, a_k) \in R^{I_1}$, it holds that $(h(a_1), \ldots, h(a_k)) \in R^{I_2}$. Consider a binary relation $\rightarrow$ defined as follows:

$$\rightarrow \;=\; \{(I_1, I_2) \mid \text{there exists a homomorphism from } I_1 \text{ to } I_2\}.$$

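In [18], relation $\rightarrow$ was introduced to simplify the definition of the extended semantics of a mapping. In fact, given a mapping $\mathcal{M}$, we have that

$$e(\mathcal{M}) \;=\; \rightarrow \circ\, \mathcal{M} \circ \rightarrow.$$

Notice that the relation $\rightarrow$ is idempotent, that is, it holds that $(\rightarrow \circ \rightarrow) \;=\; \rightarrow$. In particular, we have that

$$\rightarrow \circ\; e(\mathcal{M}) = e(\mathcal{M}), \tag{4}$$
$$e(\mathcal{M}) \;\circ \rightarrow \;= e(\mathcal{M}). \tag{5}$$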
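For completeness, identities (4) and (5) can be derived in one line each by unfolding the definition of $e(\mathcal{M})$ and using the associativity of composition together with the idempotence of $\rightarrow$. The derivation below is a routine check added for the reader and uses only the notation introduced above.

```latex
\begin{align*}
\rightarrow \circ\; e(\mathcal{M})
  &= \rightarrow \circ\, (\rightarrow \circ\, \mathcal{M} \circ \rightarrow)
   = (\rightarrow \circ \rightarrow) \circ \mathcal{M} \circ \rightarrow
   = \;\rightarrow \circ\, \mathcal{M} \circ \rightarrow
   \;= e(\mathcal{M}), \\
e(\mathcal{M}) \;\circ \rightarrow
  &= (\rightarrow \circ\, \mathcal{M} \circ \rightarrow) \circ \rightarrow
   = \;\rightarrow \circ\, \mathcal{M} \circ (\rightarrow \circ \rightarrow)
   = \;\rightarrow \circ\, \mathcal{M} \circ \rightarrow
   \;= e(\mathcal{M}).
\end{align*}
```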

Thus, if $I_1$, $I_2$, $J$ are instances such that $(I_1, I_2) \in\; \rightarrow$ and $(I_2, J) \in e(\mathcal{M})$, then $(I_1, J) \in e(\mathcal{M})$. Hence, if $(I_1, I_2) \in\; \rightarrow$, then it holds that $\text{Sol}_{e(\mathcal{M})}(I_2) \subseteq \text{Sol}_{e(\mathcal{M})}(I_1)$. We use this property in this proof.

Before proving the theorem, we make an additional observation. The extended recovery of a mapping $\mathcal{M}$ is defined in [18] only for the case when the domain of $e(\mathcal{M})$ is the set of all source instances. More precisely, a mapping $\mathcal{M}'$ is said to be an extended recovery of $\mathcal{M}$ in [18] if for every source instance $I$, it holds that $(I, I) \in e(\mathcal{M}) \circ e(\mathcal{M}')$. Thus, it is only meaningful to compare the notions of (maximum) extended recovery and (maximum) recovery for the class of mappings $\mathcal{M}$ such that the domain of $e(\mathcal{M})$ is the set of all source instances. For this reason, if $\mathcal{M}$ is a mapping from a schema $\mathbf{R}_1$ to a schema $\mathbf{R}_2$, then we assume in this proof that $\text{dom}(e(\mathcal{M})) = \text{Inst}(\mathbf{R}_1)$. It should be noticed that this implies by Proposition A.1 that:

$\mathcal{M}'$ is a maximum recovery of $e(\mathcal{M})$
if and only if
$\mathcal{M}'$ is a recovery of $e(\mathcal{M})$ and for every $(I_1, I_2) \in e(\mathcal{M}) \circ \mathcal{M}'$, it holds that $\text{Sol}_{e(\mathcal{M})}(I_2) \subseteq \text{Sol}_{e(\mathcal{M})}(I_1)$.

We extensively use this property in this proof.

² In [4], the authors provide more general characterizations for mappings that are not necessarily total by considering the notion of reduced recovery.

Now we are ready to prove Theorem 4.10. Let $\mathcal{M}$ be a mapping from a schema $\mathbf{S}$ to a schema $\mathbf{T}$, and assume that source instances are composed of null and constant values. We first show that $e(\mathcal{M})$ has a maximum recovery if and only if $\mathcal{M}$ has a maximum extended recovery.

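($\Rightarrow$) Assume that $e(\mathcal{M})$ has a maximum recovery, and let $\mathcal{M}'$ be a maximum recovery of $e(\mathcal{M})$. We show next that $\mathcal{M}'$ is also a maximum extended recovery of $\mathcal{M}$. Since $\mathcal{M}'$ is a recovery of $e(\mathcal{M})$, we have that $(I, I) \in e(\mathcal{M}) \circ \mathcal{M}'$ for every instance $I$ of $\mathbf{S}$. Moreover, from (5) we have that $e(\mathcal{M}) \circ \mathcal{M}' = e(\mathcal{M}) \circ \rightarrow \circ\, \mathcal{M}'$ and, thus, $(I, I) \in e(\mathcal{M}) \circ \rightarrow \circ\, \mathcal{M}'$ for every instance $I$ of $\mathbf{S}$. Thus, given that $(I, I) \in\; \rightarrow$ for every instance $I$ of $\mathbf{S}$, we obtain that $(I, I) \in e(\mathcal{M}) \circ \rightarrow \circ\, \mathcal{M}' \circ \rightarrow \;= e(\mathcal{M}) \circ e(\mathcal{M}')$ for every instance $I$ of $\mathbf{S}$, which implies that $\mathcal{M}'$ is an extended recovery of $\mathcal{M}$.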

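Now, let $\mathcal{M}''$ be an extended recovery of $\mathcal{M}$. Then, as above, we obtain that $(I, I) \in e(\mathcal{M}) \circ e(\mathcal{M}'')$ for every instance $I$ of $\mathbf{S}$. Thus, we have that $e(\mathcal{M}'')$ is a recovery of $e(\mathcal{M})$. Recall that $\mathcal{M}'$ is a maximum recovery of $e(\mathcal{M})$ and, hence, we have that $e(\mathcal{M}) \circ \mathcal{M}' \subseteq e(\mathcal{M}) \circ e(\mathcal{M}'')$, which implies that $e(\mathcal{M}) \circ \mathcal{M}' \circ \rightarrow \;\subseteq e(\mathcal{M}) \circ e(\mathcal{M}'') \circ \rightarrow$. Therefore, given that $e(\mathcal{M}) = e(\mathcal{M}) \circ \rightarrow$ and $e(\mathcal{M}'') \circ \rightarrow \;= e(\mathcal{M}'')$ by (5), we have that $e(\mathcal{M}) \circ \rightarrow \circ\, \mathcal{M}' \circ \rightarrow \;\subseteq e(\mathcal{M}) \circ e(\mathcal{M}'')$, which implies that $e(\mathcal{M}) \circ e(\mathcal{M}') \subseteq e(\mathcal{M}) \circ e(\mathcal{M}'')$. Thus, we have shown that $\mathcal{M}'$ is an extended recovery of $\mathcal{M}$, and that for every other extended recovery $\mathcal{M}''$ of $\mathcal{M}$, it holds that $e(\mathcal{M}) \circ e(\mathcal{M}') \subseteq e(\mathcal{M}) \circ e(\mathcal{M}'')$, which implies that $\mathcal{M}'$ is a maximum extended recovery of $\mathcal{M}$.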

($\Leftarrow$) Now assume that $\mathcal{M}$ has a maximum extended recovery, and let $\mathcal{M}'$ be a maximum extended recovery of $\mathcal{M}$. Next we show that $e(\mathcal{M}')$ is a maximum recovery of $e(\mathcal{M})$.

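Given that $\mathcal{M}'$ is an extended recovery of $\mathcal{M}$, we have that $(I, I) \in e(\mathcal{M}) \circ e(\mathcal{M}')$ for every instance $I$ of $\mathbf{S}$, which implies that $e(\mathcal{M}')$ is a recovery of $e(\mathcal{M})$. Thus, by Proposition A.1, to prove that $e(\mathcal{M}')$ is a maximum recovery of $e(\mathcal{M})$, it is enough to show that $\text{Sol}_{e(\mathcal{M})}(I_2) \subseteq \text{Sol}_{e(\mathcal{M})}(I_1)$ for every $(I_1, I_2) \in e(\mathcal{M}) \circ e(\mathcal{M}')$. Let $(I_1, I_2) \in e(\mathcal{M}) \circ e(\mathcal{M}')$. To prove that $\text{Sol}_{e(\mathcal{M})}(I_2) \subseteq \text{Sol}_{e(\mathcal{M})}(I_1)$, we make use of the following mapping $\mathcal{M}^\star$ from $\mathbf{T}$ to $\mathbf{S}$:

$$\mathcal{M}^\star = \{(J, I) \mid I \text{ is an instance of } \mathbf{S} \text{ and } (I_1, J) \notin e(\mathcal{M})\} \;\cup\; \{(J, I) \mid (I_1, J) \in e(\mathcal{M}) \text{ and } \text{Sol}_{e(\mathcal{M})}(I) \subseteq \text{Sol}_{e(\mathcal{M})}(I_1)\}.$$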

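We show first that $\mathcal{M}^\star$ is an extended recovery of $\mathcal{M}$, that is, we show that for every instance $I$ of $\mathbf{S}$, it holds that $(I, I) \in e(\mathcal{M}) \circ e(\mathcal{M}^\star)$. First, assume that $\text{Sol}_{e(\mathcal{M})}(I) \subseteq \text{Sol}_{e(\mathcal{M})}(I_1)$, and consider an arbitrary instance $J^\star$ such that $(I, J^\star) \in e(\mathcal{M})$. Notice that $(I_1, J^\star) \in e(\mathcal{M})$ since $\text{Sol}_{e(\mathcal{M})}(I) \subseteq \text{Sol}_{e(\mathcal{M})}(I_1)$. Thus, we have that $(J^\star, I) \in \mathcal{M}^\star$ and, hence, $(J^\star, I) \in e(\mathcal{M}^\star)$. Therefore, given that $(I, J^\star) \in e(\mathcal{M})$ and $(J^\star, I) \in e(\mathcal{M}^\star)$, we conclude that $(I, I) \in e(\mathcal{M}) \circ e(\mathcal{M}^\star)$. Second, assume that $\text{Sol}_{e(\mathcal{M})}(I) \not\subseteq \text{Sol}_{e(\mathcal{M})}(I_1)$. Then there exists an instance $J^\star$ such that $(I, J^\star) \in e(\mathcal{M})$ and $(I_1, J^\star) \notin e(\mathcal{M})$. By definition of $\mathcal{M}^\star$, we have that $(J^\star, I) \in \mathcal{M}^\star$ and, thus, $(J^\star, I) \in e(\mathcal{M}^\star)$. Thus, we also conclude that $(I, I) \in e(\mathcal{M}) \circ e(\mathcal{M}^\star)$ in this case.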

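We are now ready to prove that for every $(I_1, I_2) \in e(\mathcal{M}) \circ e(\mathcal{M}')$, it holds that $\text{Sol}_{e(\mathcal{M})}(I_2) \subseteq \text{Sol}_{e(\mathcal{M})}(I_1)$. Let $(I_1, I_2) \in e(\mathcal{M}) \circ e(\mathcal{M}')$. Given that $\mathcal{M}'$ is a maximum extended recovery of $\mathcal{M}$ and $\mathcal{M}^\star$ is an extended recovery of $\mathcal{M}$, we have that $e(\mathcal{M}) \circ e(\mathcal{M}') \subseteq e(\mathcal{M}) \circ e(\mathcal{M}^\star)$ and, therefore, $(I_1, I_2) \in e(\mathcal{M}) \circ e(\mathcal{M}^\star)$. Thus, given that $e(\mathcal{M}) \circ e(\mathcal{M}^\star) = e(\mathcal{M}) \circ \mathcal{M}^\star \circ \rightarrow$ by (5), we conclude that there exist instances $J$ of $\mathbf{T}$ and $I_2'$ of $\mathbf{S}$ such that $(I_1, J) \in e(\mathcal{M})$, $(J, I_2') \in \mathcal{M}^\star$ and $(I_2', I_2) \in\; \rightarrow$. Hence, by definition of $\mathcal{M}^\star$, we have that $\text{Sol}_{e(\mathcal{M})}(I_2') \subseteq \text{Sol}_{e(\mathcal{M})}(I_1)$ (since $(I_1, J) \in e(\mathcal{M})$). But we also have that $\text{Sol}_{e(\mathcal{M})}(I_2) \subseteq \text{Sol}_{e(\mathcal{M})}(I_2')$ since $(I_2', I_2) \in\; \rightarrow$ and, therefore, we conclude that $\text{Sol}_{e(\mathcal{M})}(I_2) \subseteq \text{Sol}_{e(\mathcal{M})}(I_1)$, which was to be shown.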



Up to this point, we have shown that $e(\mathcal{M})$ has a maximum recovery if and only if $\mathcal{M}$ has a maximum extended recovery. In fact, from the preceding proof, we conclude that:

(a) if $e(\mathcal{M})$ has a maximum recovery $\mathcal{M}'$, then $\mathcal{M}'$ is a maximum extended recovery of $\mathcal{M}$, and

(b) if $\mathcal{M}$ has a maximum extended recovery $\mathcal{M}'$, then $e(\mathcal{M}')$ is a maximum recovery of $e(\mathcal{M})$.

Next we prove the second part of Theorem 4.10, that is, we prove that a mapping $\mathcal{M}'$ is a maximum extended recovery of $\mathcal{M}$ if and only if $e(\mathcal{M}')$ is a maximum recovery of $e(\mathcal{M})$. It should be noticed that the "only if" direction corresponds to property (b) above and, thus, we only need to show that if $e(\mathcal{M}')$ is a maximum recovery of $e(\mathcal{M})$, then $\mathcal{M}'$ is a maximum extended recovery of $\mathcal{M}$.

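Assume that $e(\mathcal{M}')$ is a maximum recovery of $e(\mathcal{M})$. Then we have that $e(\mathcal{M}')$ is a recovery of $e(\mathcal{M})$ and, thus, $\mathcal{M}'$ is an extended recovery of $\mathcal{M}$. Now let $\mathcal{M}''$ be an extended recovery of $\mathcal{M}$. Then we have that $e(\mathcal{M}'')$ is a recovery of $e(\mathcal{M})$ and, hence, $e(\mathcal{M}) \circ e(\mathcal{M}') \subseteq e(\mathcal{M}) \circ e(\mathcal{M}'')$ since $e(\mathcal{M}')$ is a maximum recovery of $e(\mathcal{M})$. Therefore, we conclude that $\mathcal{M}'$ is an extended recovery of $\mathcal{M}$, and for every extended recovery $\mathcal{M}''$ of $\mathcal{M}$, it holds that $e(\mathcal{M}) \circ e(\mathcal{M}') \subseteq e(\mathcal{M}) \circ e(\mathcal{M}'')$, which means that $\mathcal{M}'$ is a maximum extended recovery of $\mathcal{M}$. This completes the proof of the theorem.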



